Mismatches between Traditional Optimization Analyses and Modern Deep Learning
You may remember our previous blog post showing that it is possible to do state-of-the-art deep learning with a learning rate that increases exponentially during training. It was meant as a dramatic illustration that what we learned in optimization classes and books isn't always a good fit for modern deep learning, specifically for normalized nets, which is our term for nets that use any of the popular normalization schemes, e.g., Batch Normalization (BN). Today's post (based upon our paper with Kaifeng Lyu at NeurIPS 2020) identifies other surprising incompatibilities between normalized nets and traditional analyses. We hope this will change the way you teach and think about deep learning! Before diving into the results, we recall that normalized nets are typically trained with weight decay (aka $\ell_2$ regularization).
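To make the setting concrete, recall that with learning rate $\eta$ and weight-decay coefficient $\lambda$, the SGD update is $w_{t+1} = (1-\eta\lambda)\,w_t - \eta\nabla\mathcal{L}(w_t)$. Below is a minimal sketch (assuming PyTorch; the toy model, hyperparameters, and schedule are illustrative, not our exact experimental setup) of a normalized net trained with SGD plus weight decay, together with an exponentially increasing learning-rate schedule of the kind discussed in the earlier post.

```python
# Minimal sketch (PyTorch assumed): a toy "normalized net" (it uses BatchNorm)
# trained with SGD + weight decay and a learning rate that grows exponentially.
# All names and hyperparameters here are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(                 # conv -> BN -> ReLU -> pool -> linear
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# weight_decay is the l2-regularization coefficient, applied in the update
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

# gamma > 1 makes the learning rate increase exponentially each epoch
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.05)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(3):                 # dummy data stands in for a real loader
    x = torch.randn(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    scheduler.step()                   # grow the learning rate after each epoch
```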